Fast-food restaurants by cuisine type
Fast-food restaurants are identified by the cuisine descriptions given in the inspection data. We printed the list of cuisine descriptions (n = 85), had each team member circle the ones they considered fast food, and used the union of those selections as our rule (we chose the union because we considered it the more conservative option).
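The union rule can be sketched in a few lines of base R. The rater names and their selections below are made up for illustration; only the idea of taking the union across raters comes from the text.

```r
# Hypothetical selections from three raters (illustrative only, not the team's actual votes)
selections = list(
  rater_a = c("Hamburgers", "Pizza", "Donuts"),
  rater_b = c("Pizza", "Hotdogs"),
  rater_c = c("Hamburgers", "Sandwiches")
)

# Union: a cuisine counts as fast food if at least one rater circled it
fastfood_union = sort(Reduce(union, selections))

# Intersection, for comparison: only cuisines that every rater circled
fastfood_intersection = Reduce(intersect, selections)

fastfood_union
fastfood_intersection
```

In this toy case the union contains five cuisines while the intersection is empty, which shows how strongly the choice of rule can affect the final list.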
We classify the cuisine descriptions “Bagels/Pretzels”, “Bottled beverages, including water, sodas, juices, etc.”, “Chicken”, “Delicatessen”, “Donuts”, “Hamburgers”, “Hotdogs”, “Hotdogs/Pretzels”, “Ice Cream, Gelato, Yogurt, Ices”, “Nuts/Confectionary”, “Pancakes/Waffles”, “Pizza”, “Soul Food”, “Sandwiches”, “Sandwiches/Salads/Mixed Buffet” and “Soups & Sandwiches” as fast-food restaurants. We then calculate, for each neighborhood, the total number of restaurants, the number of fast-food restaurants, and the percentage of restaurants that are fast food.
# Calculate the total number of restaurants, the number of fastfood restaurants,
# and the percentage of fastfood restaurants in each neighborhood.
neighborhood_list =
  rest_neighborhood %>%
  distinct(neighborhood) %>%
  arrange(neighborhood)

# Keep only inspection rows whose cuisine description is on the fastfood list
rest_fastfood_neighborhood =
  rest_neighborhood %>%
  filter(cuisine_description %in% c("Bagels/Pretzels",
                                    "Bottled beverages, including water, sodas, juices, etc.",
                                    "Chicken",
                                    "Delicatessen",
                                    "Donuts",
                                    "Hamburgers",
                                    "Hotdogs",
                                    "Hotdogs/Pretzels",
                                    "Ice Cream, Gelato, Yogurt, Ices",
                                    "Nuts/Confectionary",
                                    "Pancakes/Waffles",
                                    "Pizza",
                                    "Soul Food",
                                    "Sandwiches",
                                    "Sandwiches/Salads/Mixed Buffet",
                                    "Soups & Sandwiches"))

# For one neighborhood, count distinct restaurants (by camis) overall and among
# fastfood cuisines, and return the counts and their ratio as a one-row tibble
percent_fastfood_neighborhood = function(name_neighborhood){
  rest_each_neighborhood =
    rest_neighborhood %>%
    filter(neighborhood == name_neighborhood) %>%
    distinct(camis)
  n_rest_neighborhood = nrow(rest_each_neighborhood)

  rest_fastfood_distinct_neighborhood =
    rest_fastfood_neighborhood %>%
    filter(neighborhood == name_neighborhood) %>%
    distinct(camis, cuisine_description)
  n_fastfood_neighborhood = nrow(rest_fastfood_distinct_neighborhood)

  percent_fastfood_neighborhood = n_fastfood_neighborhood/n_rest_neighborhood

  tibble(
    neighborhood = name_neighborhood,
    n_fastfood = n_fastfood_neighborhood,
    n_rest = n_rest_neighborhood,
    percent_fastfood = percent_fastfood_neighborhood
  )
}

# Apply the function to every neighborhood and stack the results
fastfood_neighborhood =
  map(neighborhood_list$neighborhood, percent_fastfood_neighborhood) %>%
  bind_rows() %>%
  mutate(neighborhood = str_to_upper(neighborhood))
# plot for each neighborhood
fastfood_neighborhood %>%
  mutate(neighborhood = as.factor(neighborhood),
         n_rest = as.numeric(n_rest),
         n_nonfastfood = n_rest - n_fastfood,
         neighborhood = fct_reorder(neighborhood, percent_fastfood)) %>%
  plot_ly(., x = ~neighborhood, y = ~n_fastfood, type = 'bar', name = 'fastfood restaurants') %>%
  add_trace(y = ~n_nonfastfood, name = 'non-fastfood restaurants') %>%
  layout(yaxis = list(title = 'Number of restaurants'),
         xaxis = list(title = 'Neighborhood (ordered by percentage of fastfood restaurants)',
                      showticklabels = FALSE),
         barmode = 'stack')
From the plot, we can see that while the Greenwich Village and Soho neighborhood has a fairly large number of restaurants, it has the smallest percentage of fast-food restaurants. Williamsbridge and Baychester has the largest percentage of fast-food restaurants.
A large total number of restaurants does not necessarily come with a large percentage of fast-food restaurants, so the distribution of fast-food restaurants is not even across neighborhoods. This motivates our study: we want to investigate whether this uneven distribution is associated with different levels of chronic disease outcomes across neighborhoods.
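The per-neighborhood counts in this section could also be obtained with a single grouped summary instead of a function applied to each neighborhood. The sketch below is an alternative formulation, not the code used for the report; `toy_rest` and the shortened `fastfood_cuisines` list are hypothetical stand-ins for `rest_neighborhood` and the full cuisine list.

```r
library(dplyr)

# Toy stand-in for rest_neighborhood: one row per inspection, so restaurants repeat
toy_rest = tibble(
  camis = c(1, 1, 2, 3, 4, 4, 5),
  neighborhood = c("A", "A", "A", "A", "B", "B", "B"),
  cuisine_description = c("Pizza", "Pizza", "Thai", "Donuts", "Pizza", "Pizza", "French")
)

# Shortened, hypothetical fast-food list for the example
fastfood_cuisines = c("Pizza", "Donuts")

toy_summary =
  toy_rest %>%
  # collapse repeated inspections to one row per restaurant
  distinct(neighborhood, camis, cuisine_description) %>%
  group_by(neighborhood) %>%
  summarise(
    n_fastfood = sum(cuisine_description %in% fastfood_cuisines),
    n_rest = n_distinct(camis)
  ) %>%
  mutate(percent_fastfood = n_fastfood / n_rest)

toy_summary
```

Here neighborhood A has 2 fast-food restaurants out of 3 and neighborhood B has 1 out of 2, even though the raw data contain repeated inspection rows.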
Restaurant Chains
We first scrape the list of 75 national restaurant chains in the US from Wikipedia (https://en.wikipedia.org/wiki/List_of_restaurant_chains_in_the_United_States#Fast-casual) and then join it with the restaurant inspection data to keep only the chain restaurants in NYC.
chains_html = read_html("https://en.wikipedia.org/wiki/List_of_restaurant_chains_in_the_United_States#Fast-casual")
# read in the list of chain restaurants in the US,
# convert the names to uppercase, and store them in a column named dba
chain_rest = chains_html %>%
  html_nodes("ul:nth-child(9) li , .jquery-tablesorter tr:nth-child(1) td:nth-child(1)") %>%
  html_text() %>%
  as_tibble() %>%
  mutate(dba = value,
         dba = str_to_upper(dba)) %>%
  select(dba)
head(chain_rest, 10)
## # A tibble: 10 x 1
## dba
## <chr>
## 1 A&W RESTAURANTS
## 2 APPLEBEE'S
## 3 BAJA FRESH
## 4 BOSTON MARKET
## 5 BUFFALO WILD WINGS
## 6 CHILI'S
## 7 CHIPOTLE MEXICAN GRILL
## 8 CICI'S PIZZA
## 9 COLD STONE CREAMERY
## 10 CORNER BAKERY CAFE
Then, we match the list of US chain restaurants against the restaurant inspection data.
# remove punctuation from the names in chain_rest and in the restaurant inspection data
chain_rest_str =
  chain_rest %>%
  mutate(dba = str_replace_all(dba, "[[:punct:]]", ""))
rest_neigh_str = rest_neighborhood %>%
  mutate(dba = str_replace_all(dba, "[[:punct:]]", ""))
# match the two datasets on dba (restaurant name), both with punctuation removed,
# and keep only rows that correspond to an inspected restaurant
neighborhood_chain =
  right_join(rest_neigh_str, chain_rest_str) %>%
  filter(!is.na(camis)) %>%
  distinct(camis, dba, neighborhood, boro)
## Joining, by = "dba"
neighborhood_chain %>%
  group_by(dba) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  head(10)
## # A tibble: 10 x 2
## dba n
## <chr> <int>
## 1 DUNKIN DONUTS 454
## 2 SUBWAY 364
## 3 STARBUCKS 307
## 4 MCDONALDS 216
## 5 CHIPOTLE MEXICAN GRILL 78
## 6 WENDYS 46
## 7 APPLEBEES 28
## 8 BOSTON MARKET 24
## 9 WHITE CASTLE 23
## 10 PIZZA HUT 21
The combined dataset “neighborhood_chain” has 1,740 observations, covering 33 different chain restaurants.
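To illustrate why punctuation is stripped before matching, here is a minimal sketch on made-up data. The names, IDs, and the helper `strip_punct()` are hypothetical; the sketch uses `inner_join()`, which gives the same rows as the `right_join()` plus `filter(!is.na(camis))` combination as long as `camis` is never missing in the inspection data.

```r
library(dplyr)
library(stringr)

# Toy inspection records (hypothetical names and IDs)
toy_inspections = tibble(
  camis = c(101, 102, 103),
  dba = c("MCDONALD'S", "DUNKIN' DONUTS", "JOE'S CORNER CAFE")
)

# Toy chain list, punctuated differently from the inspection data
toy_chains = tibble(dba = c("MCDONALDS", "DUNKIN DONUTS"))

# Hypothetical helper: drop all punctuation characters from a name
strip_punct = function(x) str_replace_all(x, "[[:punct:]]", "")

toy_matched =
  inner_join(
    mutate(toy_inspections, dba = strip_punct(dba)),
    mutate(toy_chains, dba = strip_punct(dba)),
    by = "dba"
  )

toy_matched
```

Without the punctuation step, "MCDONALD'S" and "MCDONALDS" would not match. The join is still an exact string match, so a name with extra words (for example a store number or a suffix) would not be picked up by this approach.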
# counting chains in neighborhoods
neigh_count_chain = neighborhood_chain %>%
  group_by(neighborhood, boro) %>%
  summarise(chain_n = n())
neigh_count_rest = rest_neighborhood %>%
  distinct(neighborhood, camis) %>%
  group_by(neighborhood) %>%
  summarise(res_n = n())
# calculating percentage of chains in each neighborhood
percent_neighborhood_chain = left_join(neigh_count_chain, neigh_count_rest) %>%
  ungroup() %>%
  mutate(chain_percentage = chain_n/res_n,
         neighborhood = str_to_upper(neighborhood))
## Joining, by = "neighborhood"
plot_chain_neighbor = percent_neighborhood_chain %>%
  mutate(neighborhood = forcats::fct_reorder(neighborhood, chain_percentage)) %>%
  ggplot(aes(neighborhood, chain_percentage, fill = boro)) +
  geom_bar(stat = "identity") +
  labs(x = "Neighborhoods in NYC", y = "Percentage of chain restaurants") +
  theme(axis.text.x = element_blank(), axis.ticks = element_blank())
ggplotly(plot_chain_neighbor)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
max(percent_neighborhood_chain$chain_percentage)
## [1] 0.1654676
We plot the percentage of chain restaurants for each NYC neighborhood, colored by borough. The neighborhoods with the smallest percentages of chain restaurants are mostly in Brooklyn, except for “Greenwich Village and Soho” and “Lower East Side and Chinatown” in Manhattan. Neighborhoods in Queens and Manhattan are spread from low to high percentages, while most neighborhoods in the Bronx and Staten Island have high percentages. The neighborhood with the highest percentage is “Throgs Neck and Co-op City” in the Bronx, where 16.5% of all restaurants are chain restaurants.
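One detail of the percentage calculation is worth illustrating: because the join starts from the chain counts, a neighborhood with no matched chain would have no row and would not appear in the plot. Whether any such neighborhood exists in the real data is not stated here. The sketch below, on made-up counts, shows one way to keep such neighborhoods by starting from the restaurant counts instead; the toy tables are hypothetical.

```r
library(dplyr)

# Toy counts (hypothetical neighborhoods and numbers)
toy_count_rest = tibble(
  neighborhood = c("A", "B", "C"),
  res_n = c(200, 100, 50)
)
toy_count_chain = tibble(
  neighborhood = c("A", "B"),  # "C" has no matched chain, so it has no row
  chain_n = c(10, 12)
)

toy_percent =
  toy_count_rest %>%
  left_join(toy_count_chain, by = "neighborhood") %>%
  mutate(
    chain_n = coalesce(chain_n, 0),        # no match means zero chains
    chain_percentage = chain_n / res_n
  )

toy_percent
```

Neighborhood C is retained with a chain percentage of 0, rather than being dropped from the result.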
Inspection Grade
# share of inspection records with grade A in each neighborhood
gradea_neighborhood =
  rest_neighborhood %>%
  group_by(neighborhood, grade) %>%
  summarise(n = n()) %>%
  mutate(grade_percent = n / sum(n)) %>%
  filter(grade == "A") %>%
  ungroup() %>%
  mutate(neighborhood = str_to_upper(neighborhood))
# plot for neighborhood
gradea_neighborhood %>%
  mutate(neighborhood = as.factor(neighborhood),
         neighborhood = fct_reorder(neighborhood, grade_percent)) %>%
  plot_ly(x = ~neighborhood, y = ~grade_percent, color = ~neighborhood, type = "bar") %>%
  layout(yaxis = list(title = 'Percentage of Grade A restaurants'),
         xaxis = list(title = 'Neighborhoods in NYC', showticklabels = FALSE),
         showlegend = FALSE)
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
max(gradea_neighborhood$grade_percent)
The percentage of grade-A restaurants differs between neighborhoods. “Throgs Neck and Co-op City” has the highest percentage of grade-A restaurants, around 44.5%, while “Sunset Park” has the lowest, around 30.6%. “Washington Heights”, where we live, ranks fourth from the bottom at around 34%, which is consistent with our own impression of restaurant conditions in “Washington Heights”…
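The grade-A share relies on a grouping detail that is easy to miss: after `summarise()`, the result is still grouped by neighborhood, so `sum(n)` is the neighborhood total. The sketch below reproduces that logic on made-up rows; `toy_grades` is hypothetical, and the `.groups` argument assumes dplyr 1.0 or later.

```r
library(dplyr)

# Toy inspection rows (hypothetical); NA marks a row with no grade recorded
toy_grades = tibble(
  neighborhood = c("A", "A", "A", "A", "B", "B", "B"),
  grade = c("A", "A", "B", NA, "A", "C", "C")
)

toy_gradea =
  toy_grades %>%
  group_by(neighborhood, grade) %>%
  summarise(n = n(), .groups = "drop_last") %>%  # still grouped by neighborhood
  mutate(grade_percent = n / sum(n)) %>%         # share within each neighborhood
  filter(grade == "A") %>%
  ungroup()

toy_gradea
```

In neighborhood A the share is 2 out of 4 rather than 2 out of 3, because the row with a missing grade still counts in the denominator. If the inspection data contain many rows without a grade, the reported percentages would be correspondingly lower than the share among graded records.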